HW1

Assignment Description

We will work with air pollution data from the U.S. Environmental Protection Agency (EPA). The EPA has a national monitoring network of air pollution sites. The primary question you will answer is whether daily concentrations of PM2.5 (particulate matter air pollution with aerodynamic diameter less than 2.5 \(\mu\)m) have decreased in California over the last 20 years (from 2002 to 2022).

Steps

  1. Given the formulated question from the assignment description, you will now conduct EDA Checklist items 2-4. First, download 2002 and 2022 data for all sites in California from the EPA Air Quality Data website. Read in the data using the data.table package (for example, with fread()). For each of the two datasets, check the dimensions, headers, footers, variable names, and variable types. Check for any data issues, particularly in the key variable we are analyzing. Make sure you write up a summary of all of your findings.
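
The checks listed in this step can be sketched on a small inline table. This is a minimal illustration, not the assignment's actual workflow: it uses base R's read.csv() on a toy CSV instead of fread() on the EPA downloads so that it runs without the files, and the toy rows (including the negative and missing values) are made up for illustration. Only the column names are taken from the EPA export.

```r
# Toy CSV reusing column names from the EPA export; the values are
# made up for illustration and are NOT real measurements.
csv_text <- 'Date,Site ID,Daily Mean PM2.5 Concentration,CBSA_CODE
01/05/2022,60010007,12.7,41860
01/06/2022,60010007,-1.5,41860
01/07/2022,60010007,,41860
01/08/2022,60010008,302.5,'

# check.names = FALSE keeps headers that contain spaces exactly as written.
toy <- read.csv(text = csv_text, check.names = FALSE)

# Dimensions and variable types.
dims  <- dim(toy)
types <- sapply(toy, class)

# Key variable: count missing values and implausible negative
# concentrations, and look at the overall range.
pm         <- toy[["Daily Mean PM2.5 Concentration"]]
n_missing  <- sum(is.na(pm))
n_negative <- sum(pm < 0, na.rm = TRUE)
pm_range   <- range(pm, na.rm = TRUE)

# Dates are read as character in month/day/year form; convert explicitly.
toy$Date <- as.Date(toy$Date, format = "%m/%d/%Y")

print(dims)
print(types)
print(c(missing = n_missing, negative = n_negative))
print(pm_range)
```

Counting missing and negative values directly is a compact alternative to scanning a full frequency table of the concentration column, and the same calls should work unchanged on a data.table read with fread().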

    library(data.table)
    # Read the 2002 and 2022 California PM2.5 exports (paths are local to the author's machine)
    Otwo <- data.table::fread("/Users/vikaskunta/Downloads/2002ad_viz_plotval_data.csv")
    twotwo <- data.table::fread("/Users/vikaskunta/Downloads/2022ad_viz_plotval_data.csv")
    dim(Otwo)
    [1] 15976    20
    nrow(Otwo)
    [1] 15976
    ncol(Otwo)
    [1] 20
    dim(twotwo)
    [1] 57775    20
    nrow(twotwo)
    [1] 57775
    ncol(twotwo)
    [1] 20
    head(Otwo)
             Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
    1: 01/05/2002    AQS 60010007   1                           25.1 ug/m3 LC
    2: 01/06/2002    AQS 60010007   1                           31.6 ug/m3 LC
    3: 01/08/2002    AQS 60010007   1                           21.4 ug/m3 LC
    4: 01/11/2002    AQS 60010007   1                           25.9 ug/m3 LC
    5: 01/14/2002    AQS 60010007   1                           34.5 ug/m3 LC
    6: 01/17/2002    AQS 60010007   1                           41.0 ug/m3 LC
       DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
    1:              78 Livermore               1              100
    2:              92 Livermore               1              100
    3:              71 Livermore               1              100
    4:              80 Livermore               1              100
    5:              98 Livermore               1              100
    6:             115 Livermore               1              100
       AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
    1:              88101 PM2.5 - Local Conditions     41860
    2:              88101 PM2.5 - Local Conditions     41860
    3:              88101 PM2.5 - Local Conditions     41860
    4:              88101 PM2.5 - Local Conditions     41860
    5:              88101 PM2.5 - Local Conditions     41860
    6:              88101 PM2.5 - Local Conditions     41860
                               CBSA_NAME STATE_CODE      STATE COUNTY_CODE  COUNTY
    1: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    2: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    3: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    4: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    5: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    6: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
       SITE_LATITUDE SITE_LONGITUDE
    1:      37.68753      -121.7842
    2:      37.68753      -121.7842
    3:      37.68753      -121.7842
    4:      37.68753      -121.7842
    5:      37.68753      -121.7842
    6:      37.68753      -121.7842
    tail(Otwo)
             Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
    1: 12/10/2002    AQS 61131003   1                             15 ug/m3 LC
    2: 12/13/2002    AQS 61131003   1                             15 ug/m3 LC
    3: 12/22/2002    AQS 61131003   1                              1 ug/m3 LC
    4: 12/25/2002    AQS 61131003   1                             23 ug/m3 LC
    5: 12/28/2002    AQS 61131003   1                              5 ug/m3 LC
    6: 12/31/2002    AQS 61131003   1                              6 ug/m3 LC
       DAILY_AQI_VALUE            Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
    1:              57 Woodland-Gibson Road               1              100
    2:              57 Woodland-Gibson Road               1              100
    3:               4 Woodland-Gibson Road               1              100
    4:              74 Woodland-Gibson Road               1              100
    5:              21 Woodland-Gibson Road               1              100
    6:              25 Woodland-Gibson Road               1              100
       AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
    1:              88101 PM2.5 - Local Conditions     40900
    2:              88101 PM2.5 - Local Conditions     40900
    3:              88101 PM2.5 - Local Conditions     40900
    4:              88101 PM2.5 - Local Conditions     40900
    5:              88101 PM2.5 - Local Conditions     40900
    6:              88101 PM2.5 - Local Conditions     40900
                                     CBSA_NAME STATE_CODE      STATE COUNTY_CODE
    1: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    2: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    3: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    4: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    5: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    6: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
       COUNTY SITE_LATITUDE SITE_LONGITUDE
    1:   Yolo      38.66121      -121.7327
    2:   Yolo      38.66121      -121.7327
    3:   Yolo      38.66121      -121.7327
    4:   Yolo      38.66121      -121.7327
    5:   Yolo      38.66121      -121.7327
    6:   Yolo      38.66121      -121.7327
    head(twotwo)
             Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
    1: 01/01/2022    AQS 60010007   3                           12.7 ug/m3 LC
    2: 01/02/2022    AQS 60010007   3                           13.9 ug/m3 LC
    3: 01/03/2022    AQS 60010007   3                            7.1 ug/m3 LC
    4: 01/04/2022    AQS 60010007   3                            3.7 ug/m3 LC
    5: 01/05/2022    AQS 60010007   3                            4.2 ug/m3 LC
    6: 01/06/2022    AQS 60010007   3                            3.8 ug/m3 LC
       DAILY_AQI_VALUE Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
    1:              52 Livermore               1              100
    2:              55 Livermore               1              100
    3:              30 Livermore               1              100
    4:              15 Livermore               1              100
    5:              18 Livermore               1              100
    6:              16 Livermore               1              100
       AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
    1:              88101 PM2.5 - Local Conditions     41860
    2:              88101 PM2.5 - Local Conditions     41860
    3:              88101 PM2.5 - Local Conditions     41860
    4:              88101 PM2.5 - Local Conditions     41860
    5:              88101 PM2.5 - Local Conditions     41860
    6:              88101 PM2.5 - Local Conditions     41860
                               CBSA_NAME STATE_CODE      STATE COUNTY_CODE  COUNTY
    1: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    2: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    3: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    4: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    5: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
    6: San Francisco-Oakland-Hayward, CA          6 California           1 Alameda
       SITE_LATITUDE SITE_LONGITUDE
    1:      37.68753      -121.7842
    2:      37.68753      -121.7842
    3:      37.68753      -121.7842
    4:      37.68753      -121.7842
    5:      37.68753      -121.7842
    6:      37.68753      -121.7842
    tail(twotwo)
             Date Source  Site ID POC Daily Mean PM2.5 Concentration    UNITS
    1: 12/01/2022    AQS 61131003   1                            3.4 ug/m3 LC
    2: 12/07/2022    AQS 61131003   1                            3.8 ug/m3 LC
    3: 12/13/2022    AQS 61131003   1                            6.0 ug/m3 LC
    4: 12/19/2022    AQS 61131003   1                           34.8 ug/m3 LC
    5: 12/25/2022    AQS 61131003   1                           23.2 ug/m3 LC
    6: 12/31/2022    AQS 61131003   1                            1.0 ug/m3 LC
       DAILY_AQI_VALUE            Site Name DAILY_OBS_COUNT PERCENT_COMPLETE
    1:              14 Woodland-Gibson Road               1              100
    2:              16 Woodland-Gibson Road               1              100
    3:              25 Woodland-Gibson Road               1              100
    4:              99 Woodland-Gibson Road               1              100
    5:              74 Woodland-Gibson Road               1              100
    6:               4 Woodland-Gibson Road               1              100
       AQS_PARAMETER_CODE       AQS_PARAMETER_DESC CBSA_CODE
    1:              88101 PM2.5 - Local Conditions     40900
    2:              88101 PM2.5 - Local Conditions     40900
    3:              88101 PM2.5 - Local Conditions     40900
    4:              88101 PM2.5 - Local Conditions     40900
    5:              88101 PM2.5 - Local Conditions     40900
    6:              88101 PM2.5 - Local Conditions     40900
                                     CBSA_NAME STATE_CODE      STATE COUNTY_CODE
    1: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    2: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    3: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    4: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    5: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
    6: Sacramento--Roseville--Arden-Arcade, CA          6 California         113
       COUNTY SITE_LATITUDE SITE_LONGITUDE
    1:   Yolo      38.66121      -121.7327
    2:   Yolo      38.66121      -121.7327
    3:   Yolo      38.66121      -121.7327
    4:   Yolo      38.66121      -121.7327
    5:   Yolo      38.66121      -121.7327
    6:   Yolo      38.66121      -121.7327
    str(Otwo)
    Classes 'data.table' and 'data.frame':  15976 obs. of  20 variables:
     $ Date                          : chr  "01/05/2002" "01/06/2002" "01/08/2002" "01/11/2002" ...
     $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
     $ Site ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
     $ POC                           : int  1 1 1 1 1 1 1 1 1 1 ...
     $ Daily Mean PM2.5 Concentration: num  25.1 31.6 21.4 25.9 34.5 41 29.3 15 18.8 37.9 ...
     $ UNITS                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
     $ DAILY_AQI_VALUE               : int  78 92 71 80 98 115 87 57 65 107 ...
     $ Site Name                     : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
     $ DAILY_OBS_COUNT               : int  1 1 1 1 1 1 1 1 1 1 ...
     $ PERCENT_COMPLETE              : num  100 100 100 100 100 100 100 100 100 100 ...
     $ AQS_PARAMETER_CODE            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
     $ AQS_PARAMETER_DESC            : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
     $ CBSA_CODE                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
     $ CBSA_NAME                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
     $ STATE_CODE                    : int  6 6 6 6 6 6 6 6 6 6 ...
     $ STATE                         : chr  "California" "California" "California" "California" ...
     $ COUNTY_CODE                   : int  1 1 1 1 1 1 1 1 1 1 ...
     $ COUNTY                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
     $ SITE_LATITUDE                 : num  37.7 37.7 37.7 37.7 37.7 ...
     $ SITE_LONGITUDE                : num  -122 -122 -122 -122 -122 ...
     - attr(*, ".internal.selfref")=<externalptr> 
    str(twotwo)
    Classes 'data.table' and 'data.frame':  57775 obs. of  20 variables:
     $ Date                          : chr  "01/01/2022" "01/02/2022" "01/03/2022" "01/04/2022" ...
     $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
     $ Site ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
     $ POC                           : int  3 3 3 3 3 3 3 3 3 3 ...
     $ Daily Mean PM2.5 Concentration: num  12.7 13.9 7.1 3.7 4.2 3.8 2.3 6.9 13.6 11.2 ...
     $ UNITS                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
     $ DAILY_AQI_VALUE               : int  52 55 30 15 18 16 10 29 54 47 ...
     $ Site Name                     : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
     $ DAILY_OBS_COUNT               : int  1 1 1 1 1 1 1 1 1 1 ...
     $ PERCENT_COMPLETE              : num  100 100 100 100 100 100 100 100 100 100 ...
     $ AQS_PARAMETER_CODE            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
     $ AQS_PARAMETER_DESC            : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
     $ CBSA_CODE                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
     $ CBSA_NAME                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
     $ STATE_CODE                    : int  6 6 6 6 6 6 6 6 6 6 ...
     $ STATE                         : chr  "California" "California" "California" "California" ...
     $ COUNTY_CODE                   : int  1 1 1 1 1 1 1 1 1 1 ...
     $ COUNTY                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
     $ SITE_LATITUDE                 : num  37.7 37.7 37.7 37.7 37.7 ...
     $ SITE_LONGITUDE                : num  -122 -122 -122 -122 -122 ...
     - attr(*, ".internal.selfref")=<externalptr> 
    summary(Otwo[,8:13])
      Site Name         DAILY_OBS_COUNT PERCENT_COMPLETE AQS_PARAMETER_CODE
     Length:15976       Min.   :1       Min.   :100      Min.   :88101     
     Class :character   1st Qu.:1       1st Qu.:100      1st Qu.:88101     
     Mode  :character   Median :1       Median :100      Median :88101     
                        Mean   :1       Mean   :100      Mean   :88215     
                        3rd Qu.:1       3rd Qu.:100      3rd Qu.:88502     
                        Max.   :1       Max.   :100      Max.   :88502     
    
     AQS_PARAMETER_DESC   CBSA_CODE    
     Length:15976       Min.   :12540  
     Class :character   1st Qu.:23420  
     Mode  :character   Median :40140  
                        Mean   :33270  
                        3rd Qu.:41740  
                        Max.   :49700  
                        NA's   :929    
    summary(twotwo[,8:13])
      Site Name         DAILY_OBS_COUNT PERCENT_COMPLETE AQS_PARAMETER_CODE
     Length:57775       Min.   :1       Min.   :100      Min.   :88101     
     Class :character   1st Qu.:1       1st Qu.:100      1st Qu.:88101     
     Mode  :character   Median :1       Median :100      Median :88101     
                        Mean   :1       Mean   :100      Mean   :88196     
                        3rd Qu.:1       3rd Qu.:100      3rd Qu.:88101     
                        Max.   :1       Max.   :100      Max.   :88502     
    
     AQS_PARAMETER_DESC   CBSA_CODE    
     Length:57775       Min.   :12540  
     Class :character   1st Qu.:31080  
     Mode  :character   Median :40140  
                        Mean   :35447  
                        3rd Qu.:41860  
                        Max.   :49700  
                        NA's   :4761   
    table(Otwo$`Daily Mean PM2.5 Concentration`)
    
        0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9     1   1.1   1.2 
        3     7    18    23    19    28    31    21    30    35    74    32    28 
      1.3   1.4   1.5   1.6   1.7   1.8   1.9     2   2.1   2.2   2.3   2.4   2.5 
       25    24    29    25    43    29    38    97    26    45    36    39    32 
      2.6   2.7   2.8   2.9     3   3.1   3.2   3.3   3.4   3.5   3.6   3.7   3.8 
       37    38    42    40   167    43    44    46    29    48    46    51    49 
      3.9     4   4.1   4.2   4.3   4.4   4.5   4.6   4.7   4.8   4.9     5   5.1 
       52   227    41    61    60    54    62    66    49    49    57   267    55 
      5.2   5.3   5.4   5.5   5.6   5.7   5.8   5.9     6   6.1   6.2   6.3   6.4 
       63    58    66    62    55    53    63    55   332    44    64    54    57 
      6.5   6.6   6.7   6.8   6.9     7   7.1   7.2   7.3   7.4   7.5   7.6   7.7 
       57    45    61    63    51   309    65    53    70    48    63    63    71 
      7.8   7.9     8   8.1   8.2   8.3   8.4   8.5   8.6   8.7   8.8   8.9     9 
       58    55   302    51    63    49    43    65    55    56    61    73   290 
      9.1   9.2   9.3   9.4   9.5   9.6   9.7   9.8   9.9    10  10.1  10.2  10.3 
       66    55    51    54    78    57    49    72    49   249    57    68    54 
     10.4  10.5  10.6  10.7  10.8  10.9    11  11.1  11.2  11.3  11.4  11.5  11.6 
       62    56    52    50    56    51   217    62    47    51    49    71    50 
     11.7  11.8  11.9    12  12.1  12.2  12.3  12.4  12.5  12.6  12.7  12.8  12.9 
       55    58    39   209    46    54    53    40    54    42    45    62    50 
       13  13.1  13.2  13.3  13.4  13.5  13.6  13.7  13.8  13.9    14  14.1  14.2 
      190    50    47    55    45    61    53    49    45    40   177    47    43 
     14.3  14.4  14.5  14.6  14.7  14.8  14.9    15  15.1  15.2  15.3  15.4  15.5 
       38    42    49    38    57    46    48   138    40    43    50    38    45 
     15.6  15.7  15.8  15.9    16  16.1  16.2  16.3  16.4  16.5  16.6  16.7  16.8 
       47    46    39    38   129    35    34    37    36    36    32    35    29 
     16.9    17  17.1  17.2  17.3  17.4  17.5  17.6  17.7  17.8  17.9    18  18.1 
       34   105    28    33    23    46    36    31    29    31    26    79    27 
     18.2  18.3  18.4  18.5  18.6  18.7  18.8  18.9    19  19.1  19.2  19.3  19.4 
       28    37    21    32    25    48    35    23    88    36    34    29    28 
     19.5  19.6  19.7  19.8  19.9    20  20.1  20.2  20.3  20.4  20.5  20.6  20.7 
       31    21    27    31    20    85    24    21    20    23    26    22    18 
     20.8  20.9    21  21.1  21.2  21.3  21.4  21.5  21.6  21.7  21.8  21.9    22 
       33    24    70    17    26    26    24    31    13    23    20    24    62 
     22.1  22.2  22.3  22.4  22.5  22.6  22.7  22.8  22.9    23  23.1  23.2  23.3 
       15    21    24    21    27    31    23    27    12    68    20    25    20 
     23.4  23.5  23.6  23.7  23.8  23.9    24  24.1  24.2  24.3  24.4  24.5  24.6 
       14    18    29    19    20    13    55    17    20    11    13    24    23 
     24.7  24.8  24.9    25  25.1  25.2  25.3  25.4  25.5  25.6  25.7  25.8  25.9 
       15    17    13    40    14    12    19    12    25     9    22    15    23 
       26  26.1  26.2  26.3  26.4  26.5  26.6  26.7  26.8  26.9    27  27.1  27.2 
       31    17    24    21    15    29    11    20    25    12    48    16    12 
     27.3  27.4  27.5  27.6  27.7  27.8  27.9    28  28.1  28.2  28.3  28.4  28.5 
       24     3    11    15    11    13     8    42    14    11    11     8    16 
     28.6  28.7  28.8  28.9    29  29.1  29.2  29.3  29.4  29.5  29.6  29.7  29.8 
        7    12    10    11    21    13    12    12    10    13    11     9    18 
     29.9    30  30.1  30.2  30.3  30.4  30.5  30.6  30.7  30.8  30.9    31  31.1 
       12    23    11    14     7    12    11     9     7    11     6    24     9 
     31.2  31.3  31.4  31.5  31.6  31.7  31.8  31.9    32  32.1  32.2  32.3  32.4 
        3    13     7    15     8    12     8    12    31    10     2    10    15 
     32.5  32.6  32.7  32.8  32.9    33  33.1  33.2  33.3  33.4  33.5  33.6  33.7 
       12    12     8    10     5    30     6     9     9     8     5     9     3 
     33.8  33.9    34  34.1  34.2  34.3  34.4  34.5  34.6  34.7  34.8  34.9    35 
       12     4    31    10     9     8     7    11     6    11     5     5    20 
     35.1  35.2  35.3  35.4  35.5  35.6  35.7  35.8  35.9    36  36.1  36.2  36.3 
        6    13     1     7     6    13     6    15     8    17    10    10     9 
     36.4  36.5  36.6  36.7  36.8  36.9    37  37.1  37.2  37.3  37.4  37.5  37.6 
        3     8     6     7     7     8    15     3    12     7     6     2     4 
     37.7  37.8  37.9    38  38.1  38.2  38.3  38.4  38.5  38.6  38.7  38.8  38.9 
        6     3     4    25     9     9     2     4     4     4     6     9     3 
       39  39.1  39.2  39.3  39.4  39.5  39.6  39.7  39.8  39.9    40  40.1  40.2 
       13     6     6     7     4     9     9     5     5     6    16     5     5 
     40.3  40.4  40.5  40.6  40.7  40.8  40.9    41  41.1  41.2  41.3  41.4  41.5 
        5     2     9     7     6     5     3    16     4     6     6     5     6 
     41.6  41.7  41.8  41.9    42  42.1  42.2  42.3  42.4  42.5  42.6  42.7  42.8 
        5     8     4     4    22     2     6     7     3     4     1     7     9 
     42.9    43  43.1  43.2  43.3  43.4  43.5  43.6  43.7  43.8  43.9    44  44.1 
        8    18     3     1     2     5     7     3     6     4     3    10     9 
     44.2  44.3  44.4  44.5  44.6  44.7  44.8  44.9    45  45.1  45.2  45.3  45.4 
        4     7     2     6     4     5     4     3    16     4     7     5     2 
     45.5  45.6  45.7  45.8  45.9    46  46.1  46.2  46.3  46.4  46.5  46.6  46.7 
        5     3     3     5     1    17     2     1     7     4     4     4     4 
     46.8  46.9    47  47.1  47.2  47.3  47.4  47.5  47.6  47.7  47.8  47.9    48 
        5     4     7     4     3     6     3     5     3     5     4     2    12 
     48.1  48.3  48.4  48.5  48.7  48.8  48.9    49  49.1  49.2  49.3  49.4  49.5 
        6     2     7     2     6     4     5    10     1     5     1     7     5 
     49.6  49.7  49.9    50  50.1  50.2  50.3  50.4  50.5  50.6  50.7  50.8  50.9 
        2     6     2    13     2     1     4     2     5     1     3     1     2 
       51  51.1  51.2  51.3  51.4  51.5  51.6  51.7  51.8    52  52.1  52.2  52.3 
       10     2     5     9     3     1     4     3     3     2     4     1     5 
     52.4  52.5  52.6  52.7  52.8  52.9    53  53.1  53.2  53.3  53.4  53.5  53.6 
        3     4     4     1     4     3    12     2     6     3     3     4     6 
     53.7  53.9    54  54.1  54.3  54.4  54.5  54.6  54.7  54.8  54.9    55  55.1 
        4     1    10     1     2     6     1     4     2     4     1     3     6 
     55.2  55.3  55.4  55.6  55.7  55.8    56  56.1  56.3  56.5  56.6  56.7  56.8 
        6     4     3     2     1     1     8     2     5     3     3     3     3 
     56.9    57  57.1  57.2  57.3  57.4  57.5  57.6  57.7  57.8  57.9    58  58.1 
        4     9     2     3     3     6     2     2     5     1     1     7     2 
     58.2  58.4  58.5  58.6  58.7  58.8  58.9    59  59.2  59.3  59.4  59.5  59.6 
        3     2     3     5     1     3     2     7     4     1     2     3     2 
     59.7    60  60.2  60.5  60.8  60.9    61  61.1  61.4  61.6  61.7  61.8  61.9 
        5     6     1     1     2     2     8     1     1     3     2     3     2 
       62  62.1  62.2  62.3  62.5  62.6  62.7    63  63.3  63.5  63.6  63.7  63.8 
        9     2     1     1     1     2     1     7     1     1     1     2     1 
     63.9    64  64.1  64.3  64.4  64.5  64.6  64.7  64.8  64.9    65  65.1  65.2 
        4     7     2     1     3     1     1     2     2     1     7     3     2 
     65.3  65.4  65.7    66  66.2  66.3  66.6  66.8  66.9    67  67.2  67.3  67.6 
        2     1     1     7     3     5     1     2     1     3     1     1     2 
     67.7  67.8  67.9    68  68.1  68.2  68.6  68.7  68.8    69  69.1  69.2  69.6 
        3     2     1     5     2     1     1     4     2     6     1     1     3 
     69.7  69.9    70  70.1  70.2  70.3  70.5  70.8  70.9    71  71.1  71.2  71.3 
        2     2     2     1     1     1     1     1     2     3     3     1     1 
     71.7  71.8  71.9    72  72.3  72.4  72.9    73  73.1  73.2  73.5  73.6  73.9 
        1     3     1     3     1     2     1     5     2     2     1     1     2 
       74  74.4  74.7    75  75.1  75.3  75.5  75.7  75.8    76  76.1  76.3  76.4 
        2     1     1     2     2     1     1     1     1     4     1     1     2 
     76.6  76.7  76.8  76.9    77  77.2  77.4  77.6    78  78.1  78.3  78.5  79.3 
        1     1     1     1     3     2     1     1     1     2     1     1     1 
     79.5    80  80.3  80.4  80.7  80.9    81  81.1  81.3  81.6    82  82.1    83 
        1     2     1     1     2     1     1     1     1     1     3     1     2 
     83.1  83.9    84  84.1  84.2  84.4  84.6    85  85.3  85.6  85.7    86  86.2 
        1     1     2     1     2     1     1     1     1     2     1     1     1 
     86.6    87  87.4  87.5    88  88.6  89.3  89.6  89.8  90.7    91  91.7  92.5 
        1     2     1     1     1     1     1     1     1     1     2     1     1 
     93.9 102.7 104.3 
        1     1     1 
    table(twotwo$`Daily Mean PM2.5 Concentration`)
    
     -2.2    -2  -1.9  -1.7  -1.5  -1.4  -1.3  -1.2  -1.1    -1  -0.9  -0.8  -0.7 
        1     1     2     1     1     6     5     4     4    11     4    12     8 
     -0.6  -0.5  -0.4  -0.3  -0.2  -0.1     0   0.1   0.2   0.3   0.4   0.5   0.6 
       17    18    24    25    35    39   117    49    87    95    82   152   133 
      0.7   0.8   0.9     1   1.1   1.2   1.3   1.4   1.5   1.6   1.7   1.8   1.9 
      168   164   147   257   172   250   202   214   307   269   311   289   300 
        2   2.1   2.2   2.3   2.4   2.5   2.6   2.7   2.8   2.9     3   3.1   3.2 
      463   338   383   349   361   461   371   498   381   404   592   454   568 
      3.3   3.4   3.5   3.6   3.7   3.8   3.9     4   4.1   4.2   4.3   4.4   4.5 
      464   440   579   505   549   474   463   691   475   602   492   488   594 
      4.6   4.7   4.8   4.9     5   5.1   5.2   5.3   5.4   5.5   5.6   5.7   5.8 
      518   654   476   475   700   480   613   459   481   609   489   591   502 
      5.9     6   6.1   6.2   6.3   6.4   6.5   6.6   6.7   6.8   6.9     7   7.1 
      428   665   437   568   457   458   560   442   552   452   385   544   418 
      7.2   7.3   7.4   7.5   7.6   7.7   7.8   7.9     8   8.1   8.2   8.3   8.4 
      518   397   399   508   415   483   344   363   488   353   459   375   324 
      8.5   8.6   8.7   8.8   8.9     9   9.1   9.2   9.3   9.4   9.5   9.6   9.7 
      448   356   436   356   352   398   323   395   362   334   381   315   382 
      9.8   9.9    10  10.1  10.2  10.3  10.4  10.5  10.6  10.7  10.8  10.9    11 
      286   286   373   279   337   279   274   314   266   295   226   253   296 
     11.1  11.2  11.3  11.4  11.5  11.6  11.7  11.8  11.9    12  12.1  12.2  12.3 
      263   299   231   204   280   205   250   220   194   243   189   242   199 
     12.4  12.5  12.6  12.7  12.8  12.9    13  13.1  13.2  13.3  13.4  13.5  13.6 
      180   204   169   206   152   174   214   154   226   170   159   178   133 
     13.7  13.8  13.9    14  14.1  14.2  14.3  14.4  14.5  14.6  14.7  14.8  14.9 
      176   112   140   167   133   141   142   121   116   117   130   139   125 
       15  15.1  15.2  15.3  15.4  15.5  15.6  15.7  15.8  15.9    16  16.1  16.2 
      148   102   113   102    98   134   101   135    93    92   117    87    78 
     16.3  16.4  16.5  16.6  16.7  16.8  16.9    17  17.1  17.2  17.3  17.4  17.5 
       81    81   112    79    92    81    73    99    64   107    74    73    76 
     17.6  17.7  17.8  17.9    18  18.1  18.2  18.3  18.4  18.5  18.6  18.7  18.8 
       75    76    58    41    88    46    63    44    43    61    56    66    44 
     18.9    19  19.1  19.2  19.3  19.4  19.5  19.6  19.7  19.8  19.9    20  20.1 
       59    55    36    56    33    42    47    42    34    35    29    52    28 
     20.2  20.3  20.4  20.5  20.6  20.7  20.8  20.9    21  21.1  21.2  21.3  21.4 
       47    43    31    52    36    41    30    34    47    31    36    38    28 
     21.5  21.6  21.7  21.8  21.9    22  22.1  22.2  22.3  22.4  22.5  22.6  22.7 
       24    33    45    31    25    28    41    40    18    23    32    25    31 
     22.8  22.9    23  23.1  23.2  23.3  23.4  23.5  23.6  23.7  23.8  23.9    24 
       29    20    27    17    31    23    24    24    21    25    16    20    24 
     24.1  24.2  24.3  24.4  24.5  24.6  24.7  24.8  24.9    25  25.1  25.2  25.3 
       26    15    18    16    17    19    17    19    15    13    18    24    22 
     25.4  25.5  25.6  25.7  25.8  25.9    26  26.1  26.2  26.3  26.4  26.5  26.6 
       19    23    15    19    11    17    12    13    14    22    17    16    18 
     26.7  26.8  26.9    27  27.1  27.2  27.3  27.4  27.5  27.6  27.7  27.8  27.9 
       19    15    14    19    16    11    13    16    21    14    14    12    10 
       28  28.1  28.2  28.3  28.4  28.5  28.6  28.7  28.8  28.9    29  29.1  29.2 
       14    14    10    12    10    20     9    14    14    11    14     9    18 
     29.3  29.4  29.5  29.6  29.7  29.8  29.9    30  30.1  30.2  30.3  30.4  30.5 
       11    12    10    12     8     9     7     6    16    10     5     7     7 
     30.6  30.7  30.8  30.9    31  31.1  31.2  31.3  31.4  31.5  31.6  31.7  31.8 
       11     8    11    10    16     6    13     8     4     9     9     9     9 
     31.9    32  32.1  32.2  32.3  32.4  32.5  32.6  32.7  32.8  32.9    33  33.1 
       11    11     4     4     8     3     7     5    10     7     8     6    13 
     33.2  33.3  33.4  33.5  33.6  33.7  33.8  33.9    34  34.1  34.2  34.3  34.4 
        7     6     6    18     5     5    11     4     6     7     4     3     6 
     34.5  34.6  34.7  34.8  34.9    35  35.1  35.2  35.3  35.4  35.5  35.6  35.7 
        6     8     5     5     6     9     4     7     3     3    12     3     3 
     35.8  35.9    36  36.1  36.2  36.3  36.4  36.5  36.6  36.7  36.8    37  37.1 
        2     9     9     4     9     4     4     6     2     3     5    10     5 
     37.2  37.3  37.4  37.5  37.6  37.7  37.8  37.9    38  38.1  38.2  38.3  38.4 
        2     5     3     4     5     4     4     3     5     5     5     1     4 
     38.5  38.6  38.7  38.8  38.9    39  39.1  39.2  39.3  39.4  39.5  39.6  39.7 
        5     3     3     6     1     4     5     4     4     3     9     3     5 
     39.8  39.9    40  40.1  40.2  40.3  40.4  40.5  40.6  40.7  40.8  40.9    41 
        5     1     3     4     2     2     1     4     4     7     8     2     2 
     41.1  41.2  41.3  41.4  41.5  41.6  41.7  41.8  41.9    42  42.1  42.2  42.3 
        1     5     1     2     3     3     2     4     2     3     3     4     2 
     42.4  42.5  42.6  42.7  42.8    43  43.1  43.2  43.3  43.4  43.5  43.6  43.7 
        2     1     1     4     1     2     5     4     2     3     5     1     4 
     43.8  43.9    44  44.1  44.2  44.3  44.4  44.5  44.8    45  45.2  45.4  45.5 
        2     1     5     2     2     1     1     3     2     1     1     1     1 
     45.7  45.9  46.1  46.2  46.3  46.6  46.7  46.8  46.9    47  47.1  47.4  47.5 
        2     1     3     4     3     1     4     1     1     1     1     1     1 
     47.8  47.9    48  48.2  48.3  48.5  48.6  48.7  48.9    49  49.1  49.2  49.4 
        1     1     2     2     1     1     1     3     1     3     1     1     1 
     49.7  49.8    50  50.2  50.5  50.8  51.2  51.4  51.5  51.8  51.9  52.2  52.5 
        1     1     2     1     2     1     2     1     1     1     1     1     1 
     52.6  52.8  52.9    53  53.2  53.3  53.5  53.6  53.8  53.9    54  54.5  54.6 
        4     1     3     1     1     1     1     1     1     2     1     1     2 
     54.7  55.6  55.8    56  56.3  57.8  58.1  58.6    59  59.3    60  61.5  62.3 
        3     1     1     1     1     1     1     2     2     1     1     1     1 
     62.4  62.5  62.8  62.9    63  63.7    64  64.2  64.4  66.2  66.6  66.7  68.6 
        1     1     1     1     1     1     2     1     1     2     1     1     2 
       69  69.2    70  70.7    73  73.5  73.8    74  75.3  75.5  76.3  77.2  77.5 
        1     1     1     1     3     2     1     2     2     1     1     1     1 
       78    81  81.3  83.5  83.6  83.9  84.4  84.5  85.2  88.6  89.2  89.8  90.7 
        1     1     1     1     1     1     1     1     1     1     1     2     1 
     91.4  91.9  92.4  96.6  97.2  98.2 101.4 102.3   103   105 106.4 107.2   108 
        1     1     1     2     1     1     1     1     1     1     1     1     1 
    108.8 109.5 110.2 111.1 111.6 113.6 113.7 118.7   122 131.7 133.8 139.2 141.1 
        1     1     1     1     1     1     1     1     1     1     1     1     1 
    141.8 150.9 155.2 168.7   169 177.1 178.6 181.7 212.8 218.2 228.3 237.5 243.9 
        1     1     1     1     1     1     1     1     1     1     1     1     1 
    244.7 246.2 296.3 302.5 
        1     1     1     1 
    summary(Otwo$`Daily Mean PM2.5 Concentration`)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
       0.00    7.00   12.00   16.12   20.50  104.30 
    summary(twotwo$`Daily Mean PM2.5 Concentration`)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     -2.200   4.200   7.000   8.574  10.900 302.500 
    twotwo <- twotwo[twotwo$`Daily Mean PM2.5 Concentration` >= 0,]
    mean(is.na(twotwo$CBSA_CODE))
    [1] 0.08237052
    mean(is.na(Otwo$CBSA_CODE))
    [1] 0.05814972

    Both data sets have 20 columns, which suggests they contain the same variables for both years. However, the 2022 data set has far more rows than the 2002 data set (57,775 versus 15,976), meaning that more days of data were recorded and/or more sites were added by 2022. The summaries of the two data sets showed extreme minimum and maximum values: the 2022 data had a negative minimum (-2.2) and a very high maximum (302.5), while the 2002 data had a minimum of 0 and a maximum of 104.3. After further research, I found that negative PM2.5 concentrations are invalid, but values this high are possible, so I decided to remove the values below 0 and keep the high ones. Some CBSA code values were missing (about 8% in 2022 and 6% in 2002), but I kept those rows because CBSA codes are not relevant to this assignment and all other relevant values were recorded.
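
    The decision to drop negative readings can be backed by a quick count of how many rows are affected in each year. The sketch below uses a small made-up data frame (`toy`, with hypothetical values and column names) rather than the EPA files, and relies on base R only.

```r
# Toy stand-in for the two yearly data sets (values are made up)
toy <- data.frame(
  year = c(2002, 2002, 2002, 2022, 2022, 2022, 2022),
  pm25 = c(0.0, 12.0, 104.3, -2.2, 4.2, 7.0, 302.5)
)

# Flag readings below zero, which this write-up treats as invalid
toy$negative <- toy$pm25 < 0

# Count and proportion of negative readings per year
neg_count <- tapply(toy$negative, toy$year, sum)
neg_prop  <- tapply(toy$negative, toy$year, mean)
print(neg_count)
print(neg_prop)

# Keep only non-negative readings, mirroring the filter applied to the 2022 data
toy_clean <- toy[toy$pm25 >= 0, ]
```

    Reporting the count alongside the filter makes it clear how much data the cleaning step removes.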

  2. Combine the two years of data into one data frame. Use the Date variable to create a new column for year, which will serve as an identifier. Change the names of the key variables so that they are easier to refer to in your code.

    library(dplyr)
    
    Attaching package: 'dplyr'
    The following objects are masked from 'package:data.table':
    
        between, first, last
    The following objects are masked from 'package:stats':
    
        filter, lag
    The following objects are masked from 'package:base':
    
        intersect, setdiff, setequal, union
    merged <- rbind(Otwo, twotwo)
    merged$Date <- as.Date(merged$Date, format = "%m/%d/%Y")
    merged$Year <- as.integer(format(merged$Date, "%Y"))
    merged <- merged %>%
         select(Year, everything())
    # Rename the key variables in one call (setnames comes from data.table)
    setnames(
        merged,
        old = c('Daily Mean PM2.5 Concentration', 'DAILY_AQI_VALUE', 'DAILY_OBS_COUNT',
                'SITE_LATITUDE', 'SITE_LONGITUDE', 'Site Name'),
        new = c('PM2.5', 'AQI', 'OBS', 'Latitude', 'Longitude', 'SiteName')
    )
  3. Create a basic map in leaflet() that shows the locations of the sites (make sure to use different colors for each year). Summarize the spatial distribution of the monitoring sites.

    library(leaflet)
    leaflet() %>%
             addProviderTiles('CartoDB.Positron') %>%
             addCircles(
                     data = merged,
                     lat = ~Latitude, lng = ~Longitude,
                     opacity = 1, fillOpacity = 1, radius = 400,
                     color = ifelse(merged$Year == 2002, "blue", "red")
                 ) %>%
            addLegend(
            "bottomright",
            colors = c("blue", "red"),
            labels = c("2002", "2022"),
            opacity = 1
      )

    There appears to be less blue on the map because 2002 has far fewer observations than 2022. Sites active in both years may also appear only in red, since the 2022 circles are drawn after the 2002 ones. From what I can see, the blue sites (2002) lie mostly along the California coast and near the eastern border of the state, whereas the red sites (2022) are much more spread out and dominate the map, showing a large increase in the number of monitoring sites over the 20 years since 2002.
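
    Because `merged` holds one row per site per day, row counts alone do not show how many distinct sites each year has. A possible check, sketched here on a made-up data frame with hypothetical site IDs and coordinates, is to reduce the data to one row per site and year before counting or mapping.

```r
# Made-up daily records: site IDs and coordinates are hypothetical
toy <- data.frame(
  Year      = c(2002, 2002, 2002, 2022, 2022, 2022, 2022),
  SiteID    = c(1, 1, 2, 1, 2, 3, 3),
  Latitude  = c(37.7, 37.7, 34.1, 37.7, 34.1, 38.6, 38.6),
  Longitude = c(-121.8, -121.8, -118.2, -121.8, -118.2, -121.5, -121.5)
)

# One row per site and year
sites <- unique(toy[, c("Year", "SiteID", "Latitude", "Longitude")])

# Number of distinct sites in each year
site_counts <- table(sites$Year)
print(site_counts)
```

    A deduplicated table like `sites` could also be passed to the map instead of the full data, so that each location is drawn once per year.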

  4. Check for any missing or implausible values of PM2.5 in the combined dataset. Explore the proportions of each and provide a summary of any temporal patterns you see in these observations.

    summary(merged$PM2.5)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
       0.00    4.60    7.70   10.24   12.50  302.50 
    hist(merged$PM2.5)

    There were no missing values of PM2.5; however, there were some extreme outliers on the high side. The merged data set has a 1st quartile of 4.6, a median of 7.7, and a 3rd quartile of 12.5, but a maximum of 302.5, so the majority of the data sits at low values. To explore this pattern, I made a histogram: most of the data (over 80%) lies below 25 µg/m³, some falls between 25 and 60 µg/m³, and the outliers barely show up because only a few dozen values exceed 100 µg/m³ out of the 73,533 values in the data set.
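
    To look for temporal patterns in the extreme values, one option is to tabulate the share of readings above a cutoff by year and month. The following is a minimal sketch on made-up data; the cutoff of 100 and the column names are assumptions chosen for illustration, not values taken from the assignment.

```r
# Made-up readings; dates and values are hypothetical
toy <- data.frame(
  Date = as.Date(c("2002-01-05", "2002-01-08", "2002-07-04",
                   "2022-01-10", "2022-09-01", "2022-09-02")),
  PM25 = c(25.1, 104.3, 12.0, 7.0, 302.5, 150.9)
)
toy$Year  <- as.integer(format(toy$Date, "%Y"))
toy$Month <- as.integer(format(toy$Date, "%m"))

# Flag readings above an illustrative cutoff
cutoff <- 100
toy$high <- toy$PM25 > cutoff

# Share of flagged readings for each year and month
high_share <- aggregate(high ~ Year + Month, data = toy, FUN = mean)
print(high_share)

# Share of missing readings (none in this toy data)
missing_share <- mean(is.na(toy$PM25))
```

    If the flagged readings cluster in particular months of one year, that would point to a seasonal or event-driven pattern rather than a uniform shift.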

  5. Explore the main question of interest at three different spatial levels (whether daily concentrations of PM2.5 (particulate matter air pollution with aerodynamic diameter less than 2.5 \(\mu\)m) have decreased in California over the last 20 years (from 2002 to 2022)).

    Create exploratory plots (e.g. boxplots, histograms, line plots) and summary statistics that best suit each level of data. Be sure to write up explanations of what you observe in these data.

    • state

    • county

    • site in Los Angeles

library(ggplot2)

# County level: histogram of PM2.5 in Calaveras County, split by year
pm25county <- merged %>%
  filter(COUNTY == 'Calaveras' & (Year == 2002 | Year == 2022))
ggplot(data = pm25county, aes(x = PM2.5, fill = as.factor(Year))) +
  geom_histogram(binwidth = 2, position = "dodge") +
  labs(x = "PM2.5", y = "Frequency", fill = "Year",
       title = "PM2.5 rates in Calaveras for 2002 vs 2022") +
  theme_minimal()

# Site level: PM2.5 against year at the Los Angeles-North Main Street site
pm25site <- merged %>%
  filter(SiteName == 'Los Angeles-North Main Street' & (Year == 2002 | Year == 2022))
ggplot(data = pm25site, aes(x = Year, y = PM2.5, group = 1)) +
  geom_line() +
  labs(x = "Year", y = "PM2.5",
       title = "PM2.5 rates in Los Angeles-N. Main St. for 2002 and 2022") +
  theme_minimal()

# State level: boxplot of PM2.5 across all California sites, by year
pm25state <- merged %>%
  filter(Year %in% c(2002, 2022))
ggplot(data = pm25state, aes(x = as.factor(Year), y = PM2.5)) +
  geom_boxplot(fill = "lightblue", color = "red") +
  labs(x = "Year", y = "PM2.5",
       title = "Box Plot of PM2.5 levels in California for 2002 and 2022") +
  scale_y_continuous(breaks = seq(0, max(merged$PM2.5), by = 50)) +
  theme_minimal()

In Calaveras County, PM2.5 concentrations have decreased: the highest values, near 30 and 40 µg/m³, are from 2002, whereas the highest 2022 value is about 27 µg/m³. There is a much larger volume of data for 2022, but the majority of it is below 12 µg/m³, a level that most studies consider healthy.

At the Los Angeles-North Main Street site, PM2.5 concentrations have decreased substantially: the highest values, above 60 µg/m³, are from 2002, whereas the highest 2022 value is about 38 µg/m³. As noted before, there is a much larger volume of data for 2022.

Across the state of California, the maximum PM2.5 values have increased: in 2022 the extreme values exceed 150 µg/m³ and reach just over 300 µg/m³. However, looking at the interquartile range shown by the box itself (excluding the extreme values), Q1, the median, and Q3 are all lower in 2022 than in 2002. So, for the bulk of the data, PM2.5 concentrations in California decreased between 2002 and 2022.
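
The boxplot comparison can be complemented with a table of summary statistics per year. This is a sketch in base R on made-up values (the `toy` data frame and its numbers are hypothetical); on the real data, the grouping could be extended with `COUNTY` or `SiteName` to cover the county and site levels.

```r
# Made-up readings; values are hypothetical
toy <- data.frame(
  Year = c(2002, 2002, 2002, 2002, 2022, 2022, 2022, 2022),
  PM25 = c(7.0, 12.0, 20.5, 40.0, 4.0, 7.0, 11.0, 300.0)
)

# Median, mean and maximum for each year
stats <- aggregate(PM25 ~ Year, data = toy,
                   FUN = function(x) c(median = median(x),
                                       mean   = mean(x),
                                       max    = max(x)))

# aggregate() returns a matrix column; flatten it to ordinary columns
stats <- do.call(data.frame, stats)
print(stats)
```

In this toy example the 2022 median is lower even though its maximum and mean are higher, which illustrates why the median and quartiles describe a skewed distribution better than the maximum does.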